Learning with Annotation Noise

نویسندگان

  • Eyal Beigman
  • Beata Beigman Klebanov
چکیده

It is usually assumed that the kind of noise existing in annotated data is random classification noise. Yet there is evidence that differences between annotators are not always random attention slips but could result from different biases towards the classification categories, at least for the harder-to-decide cases. Under an annotation generation model that takes this into account, there is a hazard that some of the training instances are actually hard cases with unreliable annotations. We show that these are relatively unproblematic for an algorithm operating under the 0-1 loss model, whereas for the commonly used voted perceptron algorithm, hard training cases could result in incorrect prediction on the uncontroversial cases at test time.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Effects of Multimedia Annotations on Iranian EFL Learners’ L2 Vocabulary Learning

In our modern technological world, Computer-Assisted Language learning (CALL) is a new realm towards learning a language in general, and learning L2 vocabulary in particular. It is assumed that the use of multimedia annotations promotes language learners’ vocabulary acquisition. Therefore, this study set out to investigate the effects of different multimedia annotations (still picture annotatio...

متن کامل

Multimedia Annotation: Comparability of Gloss Modalities and their Implications for Reading Comprehension

This study compared the effects of two annotation modalities on the reading comprehension of Iranian intermediate level EFL learners. The two experimental groups under study received treatment on 10 academic L2 reading passages under one of two conditions: One group received treatment on key words in the reading passages through a multimedia environment providing textual annotations. The second...

متن کامل

A CAD System Framework for the Automatic Diagnosis and Annotation of Histological and Bone Marrow Images

Due to ever increasing of medical images data in the world’s medical centers and recent developments in hardware and technology of medical imaging, necessity of medical data software analysis is needed. Equipping medical science with intelligent tools in diagnosis and treatment of illnesses has resulted in reduction of physicians’ errors and physical and financial damages. In this article we pr...

متن کامل

Taking the best from the Crowd: Learning Question Passage Classification from Noisy Data

In this paper, we propose methods to take into account the disagreement between crowd annotators as well as their skills for weighting instances in learning algorithms. The latter can thus better deal with noise in the annotation and produce higher accuracy. We created two passage reranking datasets: one with crowdsource platform, and the second with an expert who completely revised the crowd a...

متن کامل

The effects of traffic noise on memory and auditory-verbal learning in Persian language children

Background: Acoustic noise is one of the universal pollutants of modern society. Although the high level of noise adverse effects on human hearing has been known for many years, non-auditory effects of noise such as effects on cognition, learning, memory and reading, especially on children, have been less considered. Factors which have negative impact on these features can also have a negat...

متن کامل

Data Quality from Crowdsourcing: A Study of Annotation Selection Criteria

Annotation acquisition is an essential step in training supervised classifiers. However, manual annotation is often time-consuming and expensive. The possibility of recruiting annotators through Internet services (e.g., Amazon Mechanic Turk) is an appealing option that allows multiple labeling tasks to be outsourced in bulk, typically with low overall costs and fast completion rates. In this pa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009